The shiny package makes it possible to build, on top of R functions, a JavaScript web page that interacts with the R code and displays the results, all within a web browser. This is a good way to build a proof of concept (POC) and validate the value of our code before developing full software around it.
The app.R script contains the Shiny web application, both the server and the UI.
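To make the split between UI and server concrete, here is a minimal sketch of what an app.R file can look like. It is not the application developed in this project: the input name `max_price`, the output name `summary_text` and the slider bounds are assumptions chosen only for illustration.

```r
library(shiny)

# UI: one input and one output placeholder (names are illustrative).
ui <- fluidPage(
  titlePanel("Minimal Shiny sketch"),
  sliderInput("max_price", "Maximum price", min = 0, max = 500, value = 100),
  textOutput("summary_text")
)

# Server: R code that reacts to the input and fills the output.
server <- function(input, output, session) {
  output$summary_text <- renderText({
    paste("Showing listings up to", input$max_price)
  })
}

# Build the app object; calling runApp(app) would open it in the browser.
app <- shinyApp(ui = ui, server = server)
```

Keeping both parts in a single file is convenient for a proof of concept; larger applications often split them into separate files.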
The data provided for this exercise is an .RData file called AirBnB.RData, which contains data about AirBnB listings in Paris.
We were asked to explore and analyse the Paris dataset by creating a Shiny application, which should contain:
I consider the following data in the dataset as features:
* Room type
* Property type
* Neighborhood
* Price
* Type of owner (host vs superhost)
* Location of the listings
Based on these features, I developed the analysis of the dataset.
The first thing to do is to install the shiny package and its dependencies (installing shinyjs with dependencies=TRUE pulls in shiny as well), along with the EDAWR package, which is installed from GitHub:
install.packages("shinyjs", dependencies=TRUE)
devtools::install_github("rstudio/EDAWR")
After this, we can load all the packages used in the project. If any of these packages is not yet installed, install it first by following the previous steps:
library(shiny)
library(shinydashboard)
## Warning: package 'shinydashboard' was built under R version 4.0.5
##
## Attaching package: 'shinydashboard'
## The following object is masked from 'package:graphics':
##
## box
library(shinyalert)
## Warning: package 'shinyalert' was built under R version 4.0.5
##
## Attaching package: 'shinyalert'
## The following object is masked from 'package:shiny':
##
## runExample
library(shinycssloaders)
## Warning: package 'shinycssloaders' was built under R version 4.0.5
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(EDAWR)
##
## Attaching package: 'EDAWR'
## The following object is masked from 'package:dplyr':
##
## storms
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.5
##
## Attaching package: 'tidyr'
## The following objects are masked from 'package:EDAWR':
##
## population, who
library(stringr)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.4.0 v purrr 0.3.4
## v tibble 3.1.8 v forcats 0.5.0
## v readr 1.3.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggmap)
## i Google's Terms of Service: <https://mapsplatform.google.com>
## i Please cite ggmap if you use it! Use `citation("ggmap")` for details.
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggmap':
##
## wind
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(DT)
## Warning: package 'DT' was built under R version 4.0.3
##
## Attaching package: 'DT'
## The following objects are masked from 'package:shiny':
##
## dataTableOutput, renderDataTable
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.0.5
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.0.5
library(RColorBrewer)
## Warning: package 'RColorBrewer' was built under R version 4.0.5
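The check for packages that are not yet installed can be sketched as follows. `missing_packages()` is a hypothetical helper, not part of any package, and EDAWR is left out of the list because it is installed from GitHub rather than CRAN.

```r
# CRAN packages loaded above.
required <- c("shiny", "shinydashboard", "shinyalert", "shinycssloaders",
              "dplyr", "tidyr", "stringr", "tidyverse", "ggmap", "ggplot2",
              "plotly", "DT", "leaflet", "ggpubr", "RColorBrewer")

# Return the subset of `pkgs` that is not installed (hypothetical helper).
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

The install call is commented out on purpose, so that running the sketch only reports what is missing.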
First, load the dataset:
load("AirBnB.RData")
Loading the file creates two data frames named L and R. We can have a look at the first rows of each:
head(L)
## id listing_url scrape_id last_scraped
## 1 4867396 https://www.airbnb.com/rooms/4867396 2.01607e+13 2016-07-03
## 2 7704653 https://www.airbnb.com/rooms/7704653 2.01607e+13 2016-07-04
## 3 2725029 https://www.airbnb.com/rooms/2725029 2.01607e+13 2016-07-04
## 4 9337509 https://www.airbnb.com/rooms/9337509 2.01607e+13 2016-07-03
## 5 12928158 https://www.airbnb.com/rooms/12928158 2.01607e+13 2016-07-04
## 6 5589471 https://www.airbnb.com/rooms/5589471 2.01607e+13 2016-07-04
## name
## 1 Appartement 60m2 Rue Legendre 75017
## 2 Appart au pied de l'arc de triomphe
## 3 Nice appartment in Batignolles
## 4 Charming flat near Batignolles
## 5 Spacious bedroom near the centre of Paris
## 6 Rare, Maison individuelle 200m2
## summary
## 1 Au 2ème étage d'un bel immeuble joli 2 pièces meublé comprenant: une grande pièce à vivre lumineuse, une chambre, une cuisine, salle de douche et WC séparé. Appartement très calme et lumineux. A proximité de nombreux commerces et transports.
## 2 Nous proposons cette appartement situé en plein coeur de Paris, au pied de l'arc de triomphe. Commerçants, métro, cinéma, vous trouverez à proximité tout ce qu'il faut pour passer quelques jours à Paris en amoureux, entre copains ou en famille !
## 3 Located in the very charming Batignolles, this cozy and bright two-room appartment will perfectly suit your stay in Paris.
## 4 Welcome to my apartment ! This a quiet and cosy flat with 2 room (25 sqm2) fully furnished closed to trendy Batignolles area in the heart of the 17th district. (Near Montmartre foothill / Place de Clichy).
## 5 Spacious, quiet and bright room, ideal to explore and enjoy
## 6 Maison individuelle, 200 m2 habitable,rénovée en 2013. Quartier résidentiel, nombreux commerces, restaurants. Maison familiale, pouvant accueillir 5 adultes et un enfant (1 lit en hauteur).
## space
## 1
## 2 L'appartement est composé de : - une grande chambre (environ 15m2) avec un lit simple et d'un matelas d'appoint - une salle de bain avec douche, lave linge/sèche linge - un autre chambre (environ 10m2) avec un lit double (lit gigogne) et une salle de bain dans la chambre (douche) - un grand salon avec une cuisine ouverte (environ 35 m2) - wc séparé Le cuisine est tout équipé : machine nespresso, cocotte-minute, mixeur, lave vaisselle... L'appartement est très lumineux puisqu'il donne sur une avenue large mais calme. Vous trouverez à proximité plein de commercants, de bar pour sortir, de restaurants, des cinémas, des musées. Vous serez au coeur de la ville ! N'hésitez pas à nous contacter pour plus d'information, de photos...
## 3
## 4
## 5
## 6
## description
## 1 Au 2ème étage d'un bel immeuble joli 2 pièces meublé comprenant: une grande pièce à vivre lumineuse, une chambre, une cuisine, salle de douche et WC séparé. Appartement très calme et lumineux. A proximité de nombreux commerces et transports.
## 2 Nous proposons cette appartement situé en plein coeur de Paris, au pied de l'arc de triomphe. Commerçants, métro, cinéma, vous trouverez à proximité tout ce qu'il faut pour passer quelques jours à Paris en amoureux, entre copains ou en famille ! L'appartement est composé de : - une grande chambre (environ 15m2) avec un lit simple et d'un matelas d'appoint - une salle de bain avec douche, lave linge/sèche linge - un autre chambre (environ 10m2) avec un lit double (lit gigogne) et une salle de bain dans la chambre (douche) - un grand salon avec une cuisine ouverte (environ 35 m2) - wc séparé Le cuisine est tout équipé : machine nespresso, cocotte-minute, mixeur, lave vaisselle... L'appartement est très lumineux puisqu'il donne sur une avenue large mais calme. Vous trouverez à proximité plein de commercants, de bar pour sortir, de restaurants, des cinémas, des musées. Vous serez au coeur de la ville ! N'hésitez pas à nous contacter pour plus d'information, de photos...
## 3 Located in the very charming Batignolles, this cozy and bright two-room appartment will perfectly suit your stay in Paris.
## 4 Welcome to my apartment ! This a quiet and cosy flat with 2 room (25 sqm2) fully furnished closed to trendy Batignolles area in the heart of the 17th district. (Near Montmartre foothill / Place de Clichy).
## 5 Spacious, quiet and bright room, ideal to explore and enjoy
## 6 Maison individuelle, 200 m2 habitable,rénovée en 2013. Quartier résidentiel, nombreux commerces, restaurants. Maison familiale, pouvant accueillir 5 adultes et un enfant (1 lit en hauteur).
## experiences_offered neighborhood_overview notes transit access interaction
## 1 none
## 2 none
## 3 none
## 4 none
## 5 none
## 6 none
## house_rules
## 1
## 2
## 3
## 4
## 5
## 6
## thumbnail_url
## 1
## 2 https://a1.muscache.com/im/pictures/97911969/ef37b496_original.jpg?aki_policy=small
## 3
## 4
## 5 https://a2.muscache.com/im/pictures/df47511b-0e86-4dcb-9887-569489b16020.jpg?aki_policy=small
## 6
## medium_url
## 1
## 2 https://a1.muscache.com/im/pictures/97911969/ef37b496_original.jpg?aki_policy=medium
## 3
## 4
## 5 https://a2.muscache.com/im/pictures/df47511b-0e86-4dcb-9887-569489b16020.jpg?aki_policy=medium
## 6
## picture_url
## 1 https://a1.muscache.com/im/pictures/61090424/02c8a8bb_original.jpg?aki_policy=large
## 2 https://a1.muscache.com/im/pictures/97911969/ef37b496_original.jpg?aki_policy=large
## 3 https://a1.muscache.com/im/pictures/96821426/ea9864f1_original.jpg?aki_policy=large
## 4 https://a2.muscache.com/im/pictures/5fa65f2d-b159-4fb5-986a-bd36cb92d2bc.jpg?aki_policy=large
## 5 https://a2.muscache.com/im/pictures/df47511b-0e86-4dcb-9887-569489b16020.jpg?aki_policy=large
## 6 https://a2.muscache.com/im/pictures/69589240/79d976c4_original.jpg?aki_policy=large
## xl_picture_url
## 1
## 2 https://a1.muscache.com/im/pictures/97911969/ef37b496_original.jpg?aki_policy=x_large
## 3
## 4
## 5 https://a2.muscache.com/im/pictures/df47511b-0e86-4dcb-9887-569489b16020.jpg?aki_policy=x_large
## 6
## host_id host_url host_name host_since
## 1 9703910 https://www.airbnb.com/users/show/9703910 Matthieu 2013-10-29
## 2 35777602 https://www.airbnb.com/users/show/35777602 Claire 2015-06-14
## 3 13945253 https://www.airbnb.com/users/show/13945253 Vincent 2014-04-06
## 4 5107123 https://www.airbnb.com/users/show/5107123 Julie 2013-02-16
## 5 51195601 https://www.airbnb.com/users/show/51195601 Daniele 2015-12-13
## 6 28980052 https://www.airbnb.com/users/show/28980052 Philippe 2015-03-08
## host_location
## 1 Nantes, Pays de la Loire, France
## 2 Paris, ÃŽle-de-France, France
## 3 Paris, ÃŽle-de-France, France
## 4 Paris, ÃŽle-de-France, France
## 5 Prato, Toscana, Italy
## 6 Paris, ÃŽle-de-France, France
## host_about
## 1
## 2
## 3
## 4 Nous sommes un jeune couple vivant à Paris. Nous aimons beaucoup voyager
## 5
## 6
## host_response_time host_response_rate host_acceptance_rate host_is_superhost
## 1 N/A N/A N/A f
## 2 N/A N/A N/A f
## 3 within an hour 100% N/A f
## 4 within a day 50% N/A f
## 5 within an hour 100% 60% f
## 6 N/A N/A N/A f
## host_thumbnail_url
## 1 https://a0.muscache.com/im/users/9703910/profile_pic/1383073563/original.jpg?aki_policy=profile_small
## 2 https://a1.muscache.com/im/users/35777602/profile_pic/1438688930/original.jpg?aki_policy=profile_small
## 3 https://a0.muscache.com/im/users/13945253/profile_pic/1396781528/original.jpg?aki_policy=profile_small
## 4 https://a1.muscache.com/im/users/5107123/profile_pic/1425849895/original.jpg?aki_policy=profile_small
## 5 https://a2.muscache.com/im/pictures/e984ba68-7571-46d9-99dc-735ec6e5c9d6.jpg?aki_policy=profile_small
## 6 https://a0.muscache.com/im/users/28980052/profile_pic/1425844331/original.jpg?aki_policy=profile_small
## host_picture_url
## 1 https://a0.muscache.com/im/users/9703910/profile_pic/1383073563/original.jpg?aki_policy=profile_x_medium
## 2 https://a1.muscache.com/im/users/35777602/profile_pic/1438688930/original.jpg?aki_policy=profile_x_medium
## 3 https://a0.muscache.com/im/users/13945253/profile_pic/1396781528/original.jpg?aki_policy=profile_x_medium
## 4 https://a1.muscache.com/im/users/5107123/profile_pic/1425849895/original.jpg?aki_policy=profile_x_medium
## 5 https://a2.muscache.com/im/pictures/e984ba68-7571-46d9-99dc-735ec6e5c9d6.jpg?aki_policy=profile_x_medium
## 6 https://a0.muscache.com/im/users/28980052/profile_pic/1425844331/original.jpg?aki_policy=profile_x_medium
## host_neighbourhood host_listings_count host_total_listings_count
## 1 Batignolles 1 1
## 2 Champs-Elysées 1 1
## 3 Batignolles 1 1
## 4 Batignolles 1 1
## 5 Ternes 1 1
## 6 Batignolles 1 1
## host_verifications host_has_profile_pic
## 1 ['email', 'phone', 'reviews'] t
## 2 ['email', 'phone', 'reviews'] t
## 3 ['email', 'phone', 'reviews'] t
## 4 ['email', 'phone', 'reviews', 'jumio'] t
## 5 ['email', 'phone', 'reviews', 'jumio'] t
## 6 ['email', 'phone'] t
## host_identity_verified
## 1 f
## 2 f
## 3 f
## 4 t
## 5 t
## 6 f
## street neighbourhood
## 1 Rue Legendre, Paris, ÃŽle-de-France 75017, France Batignolles
## 2 Avenue Mac-Mahon, Paris, Île-de-France 75017, France Champs-Elysées
## 3 Rue la Condamine, Paris, ÃŽle-de-France 75017, France Batignolles
## 4 Rue Gauthey, Paris, ÃŽle-de-France 75017, France Batignolles
## 5 Avenue Brunetière, Paris, Île-de-France 75017, France Ternes
## 6 Rue de Saussure, Paris, ÃŽle-de-France 75017, France Batignolles
## neighbourhood_cleansed neighbourhood_group_cleansed city state
## 1 Batignolles-Monceau NA Paris ÃŽle-de-France
## 2 Batignolles-Monceau NA Paris ÃŽle-de-France
## 3 Batignolles-Monceau NA Paris ÃŽle-de-France
## 4 Batignolles-Monceau NA Paris ÃŽle-de-France
## 5 Batignolles-Monceau NA Paris ÃŽle-de-France
## 6 Batignolles-Monceau NA Paris ÃŽle-de-France
## zipcode market smart_location country_code country latitude longitude
## 1 75017 Paris Paris, France FR France 48.88880 2.320466
## 2 75017 Paris Paris, France FR France 48.87664 2.293724
## 3 75017 Paris Paris, France FR France 48.88384 2.321031
## 4 75017 Paris Paris, France FR France 48.89236 2.322338
## 5 75017 Paris Paris, France FR France 48.88942 2.298321
## 6 75017 Paris Paris, France FR France 48.88707 2.312212
## is_location_exact property_type room_type accommodates bathrooms
## 1 t Apartment Entire home/apt 2 1
## 2 t Apartment Entire home/apt 4 2
## 3 t Apartment Entire home/apt 2 1
## 4 t Apartment Entire home/apt 2 1
## 5 t Apartment Private room 2 1
## 6 t House Entire home/apt 6 3
## bedrooms beds bed_type
## 1 1 1 Real Bed
## 2 2 3 Real Bed
## 3 1 1 Real Bed
## 4 1 1 Real Bed
## 5 1 1 Real Bed
## 6 4 4 Real Bed
## amenities
## 1 {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer,Dryer,Essentials}
## 2 {"Wireless Internet",Kitchen,"Elevator in Building","Buzzer/Wireless Intercom",Washer,Dryer,Essentials}
## 3 {TV,Internet,"Wireless Internet",Kitchen,"Indoor Fireplace",Heating,"Family/Kid Friendly",Washer,Essentials,Shampoo}
## 4 {"Wireless Internet",Kitchen,Heating,Washer,Essentials}
## 5 {"Wireless Internet",Kitchen,"Smoking Allowed","Pets Allowed",Breakfast,"Elevator in Building",Heating,"Family/Kid Friendly",Washer,Dryer,Essentials,Shampoo}
## 6 {TV,Internet,"Wireless Internet",Kitchen,Heating,"Family/Kid Friendly",Washer,Dryer,"Smoke Detector","Fire Extinguisher",Essentials}
## square_feet price weekly_price monthly_price security_deposit cleaning_fee
## 1 NA $60.00 $388.00 $200.00 $20.00
## 2 NA $200.00
## 3 NA $80.00 $501.00 $1,503.00 $501.00
## 4 NA $60.00 $250.00
## 5 NA $50.00
## 6 NA $191.00 $50.00
## guests_included extra_people minimum_nights maximum_nights calendar_updated
## 1 1 $0.00 1 1125 5 months ago
## 2 1 $0.00 1 1125 11 months ago
## 3 1 $0.00 3 1125 today
## 4 0 $0.00 2 1125 8 months ago
## 5 1 $0.00 1 30 4 weeks ago
## 6 1 $0.00 3 1125 5 months ago
## has_availability availability_30 availability_60 availability_90
## 1 NA 0 0 0
## 2 NA 0 0 0
## 3 NA 6 23 23
## 4 NA 29 59 89
## 5 NA 29 59 89
## 6 NA 0 0 0
## availability_365 calendar_last_scraped number_of_reviews first_review
## 1 0 2016-07-03 1 2015-05-19
## 2 0 2016-07-04 0
## 3 298 2016-07-04 1 2015-10-10
## 4 364 2016-07-03 1 2015-12-15
## 5 89 2016-07-04 2 2016-06-17
## 6 0 2016-07-04 0
## last_review review_scores_rating review_scores_accuracy
## 1 2015-05-19 100 10
## 2 NA NA
## 3 2015-10-10 80 NA
## 4 2015-12-15 80 6
## 5 2016-06-17 100 10
## 6 NA NA
## review_scores_cleanliness review_scores_checkin review_scores_communication
## 1 10 10 10
## 2 NA NA NA
## 3 NA NA NA
## 4 10 8 10
## 5 10 10 10
## 6 NA NA NA
## review_scores_location review_scores_value requires_license license
## 1 10 10 f
## 2 NA NA f
## 3 NA NA f
## 4 6 8 f
## 5 10 10 f
## 6 NA NA f
## jurisdiction_names instant_bookable cancellation_policy
## 1 Paris f flexible
## 2 Paris f flexible
## 3 Paris f flexible
## 4 Paris f flexible
## 5 Paris f flexible
## 6 Paris f flexible
## require_guest_profile_picture require_guest_phone_verification
## 1 f f
## 2 f f
## 3 f f
## 4 f f
## 5 f f
## 6 f f
## calculated_host_listings_count reviews_per_month
## 1 1 0.07
## 2 1 NA
## 3 1 0.11
## 4 1 0.15
## 5 1 2.00
## 6 1 NA
head(R)
## listing_id date
## 1 12007141 2016-04-16
## 2 12007141 2016-04-26
## 3 12007141 2016-05-03
## 4 12007141 2016-06-15
## 5 6666099 2015-06-21
## 6 6666099 2015-07-27
We observe the following:
The data frame L contains 95 variables of different types.
The data frame R contains only two variables.
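These observations can be checked programmatically. The sketch below uses a hypothetical helper, `describe_objects()`, and small stand-in data frames so that it runs without AirBnB.RData; the stand-in values are taken from the printed rows above and are not the full data.

```r
# Hypothetical helper: one row per object with its class and dimensions.
describe_objects <- function(objects) {
  data.frame(
    name = names(objects),
    class = vapply(objects, function(x) class(x)[1], character(1)),
    rows = vapply(objects, NROW, integer(1)),
    cols = vapply(objects, NCOL, integer(1)),
    row.names = NULL
  )
}

# Stand-in objects so the sketch runs without the real file.
demo <- list(
  L = data.frame(id = c(4867396, 7704653, 2725029),
                 price = c("$60.00", "$200.00", "$80.00")),
  R = data.frame(listing_id = c(12007141, 12007141, 6666099),
                 date = as.Date(c("2016-04-16", "2016-04-26", "2015-06-21")))
)
overview <- describe_objects(demo)
overview

# With the real file loaded: describe_objects(list(L = L, R = R))
```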
L will be used to analyse the features, and R will be used to compute the visit frequency of the different quarters over time.
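One possible way to compute such a frequency is to join the two tables on the listing identifier and count rows per quarter and year. This is only a sketch under assumptions: each row of R is treated as one visit, the zipcode is used as the quarter, and the toy values below are made up for illustration.

```r
library(dplyr)

# Toy stand-ins for L and R (values are made up for illustration).
listings <- data.frame(
  id = c(1, 2, 3),
  zipcode = c("75017", "75017", "75011")
)
visits <- data.frame(
  listing_id = c(1, 1, 2, 3, 3, 3),
  date = c("2015-06-21", "2016-04-16", "2016-04-26",
           "2015-07-27", "2016-05-03", "2016-06-15")
)

# Attach the quarter to each visit, then count visits per quarter and year.
visit_frequency <- visits %>%
  inner_join(listings, by = c("listing_id" = "id")) %>%
  mutate(year = format(as.Date(date), "%Y")) %>%
  count(zipcode, year, name = "visits")

visit_frequency
```

An inner join drops visits whose listing is absent from the listings table; a left join would keep them with a missing quarter instead.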
Using dplyr's select() function, a subset of L is created that keeps only the variables (out of the 95) that are useful for the project, renaming some of them along the way:
data <- select(L, listing_id = id, host_id, host_name, bathrooms, bedrooms,
  beds, bed_type, equipments = amenities, type = property_type, room = room_type,
  nb_of_guests = accommodates, price, guests_included, minimum_nights,
  maximum_nights, availability_over_one_year = availability_365, instant_bookable,
  cancellation_policy, city, address = street, neighbourhood = neighbourhood_cleansed,
  city_quarter = zipcode, latitude, longitude, security_deposit, transit,
  host_response_time, superhost = host_is_superhost, host_since,
  listing_count = calculated_host_listings_count, host_score = review_scores_rating,
  reviews_per_month, number_of_reviews)
head(data)
## listing_id host_id host_name bathrooms bedrooms beds bed_type
## 1 4867396 9703910 Matthieu 1 1 1 Real Bed
## 2 7704653 35777602 Claire 2 2 3 Real Bed
## 3 2725029 13945253 Vincent 1 1 1 Real Bed
## 4 9337509 5107123 Julie 1 1 1 Real Bed
## 5 12928158 51195601 Daniele 1 1 1 Real Bed
## 6 5589471 28980052 Philippe 3 4 4 Real Bed
## equipments
## 1 {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer,Dryer,Essentials}
## 2 {"Wireless Internet",Kitchen,"Elevator in Building","Buzzer/Wireless Intercom",Washer,Dryer,Essentials}
## 3 {TV,Internet,"Wireless Internet",Kitchen,"Indoor Fireplace",Heating,"Family/Kid Friendly",Washer,Essentials,Shampoo}
## 4 {"Wireless Internet",Kitchen,Heating,Washer,Essentials}
## 5 {"Wireless Internet",Kitchen,"Smoking Allowed","Pets Allowed",Breakfast,"Elevator in Building",Heating,"Family/Kid Friendly",Washer,Dryer,Essentials,Shampoo}
## 6 {TV,Internet,"Wireless Internet",Kitchen,Heating,"Family/Kid Friendly",Washer,Dryer,"Smoke Detector","Fire Extinguisher",Essentials}
## type room nb_of_guests price guests_included minimum_nights
## 1 Apartment Entire home/apt 2 $60.00 1 1
## 2 Apartment Entire home/apt 4 $200.00 1 1
## 3 Apartment Entire home/apt 2 $80.00 1 3
## 4 Apartment Entire home/apt 2 $60.00 0 2
## 5 Apartment Private room 2 $50.00 1 1
## 6 House Entire home/apt 6 $191.00 1 3
## maximum_nights availability_over_one_year instant_bookable
## 1 1125 0 f
## 2 1125 0 f
## 3 1125 298 f
## 4 1125 364 f
## 5 30 89 f
## 6 1125 0 f
## cancellation_policy city
## 1 flexible Paris
## 2 flexible Paris
## 3 flexible Paris
## 4 flexible Paris
## 5 flexible Paris
## 6 flexible Paris
## address neighbourhood
## 1 Rue Legendre, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 2 Avenue Mac-Mahon, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 3 Rue la Condamine, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 4 Rue Gauthey, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 5 Avenue Brunetière, Paris, Île-de-France 75017, France Batignolles-Monceau
## 6 Rue de Saussure, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## city_quarter latitude longitude security_deposit transit host_response_time
## 1 75017 48.88880 2.320466 $200.00 N/A
## 2 75017 48.87664 2.293724 N/A
## 3 75017 48.88384 2.321031 $501.00 within an hour
## 4 75017 48.89236 2.322338 $250.00 within a day
## 5 75017 48.88942 2.298321 within an hour
## 6 75017 48.88707 2.312212 N/A
## superhost host_since listing_count host_score reviews_per_month
## 1 f 2013-10-29 1 100 0.07
## 2 f 2015-06-14 1 NA NA
## 3 f 2014-04-06 1 80 0.11
## 4 f 2013-02-16 1 80 0.15
## 5 f 2015-12-13 1 100 2.00
## 6 f 2015-03-08 1 NA NA
## number_of_reviews
## 1 1
## 2 0
## 3 1
## 4 1
## 5 2
## 6 0
As part of the cleaning of the dataset, duplicate listings need to be removed. distinct() returns a new data frame, so the result is assigned back to data:
data <- data %>% distinct(listing_id, .keep_all = TRUE)
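As a small illustration of this step, the sketch below counts the duplicated identifiers first and then keeps one row per listing. The toy data frame is made up. Note that distinct() does not modify its input, so the result has to be stored in a variable to take effect.

```r
library(dplyr)

# Toy data with one repeated listing_id (values are made up).
toy <- data.frame(listing_id = c(10, 11, 11, 12),
                  price = c("$60.00", "$80.00", "$80.00", "$50.00"))

# Number of rows whose listing_id already appeared earlier.
n_duplicates <- sum(duplicated(toy$listing_id))

# Keep the first row for each listing_id, with all other columns.
toy_unique <- toy %>% distinct(listing_id, .keep_all = TRUE)
```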
Also, the $ sign and the thousands separators in the prices would cause problems when manipulating the numbers, so they need to be removed as well:
data$price <- substring(gsub(",", "", as.character(data$price)),2)
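The same cleaning can be wrapped in a small function that also converts the result to a number. `parse_price()` is a hypothetical helper, shown here as one possible variant of the step above; the example strings follow the format of the price columns printed earlier.

```r
# Hypothetical helper: turn strings such as "$1,503.00" into numbers.
parse_price <- function(x) {
  cleaned <- gsub("[$,]", "", as.character(x))
  cleaned[cleaned == ""] <- NA   # empty strings become missing values
  as.numeric(cleaned)
}

parse_price(c("$60.00", "$1,503.00", ""))
```

Going through as.character() first means the helper behaves the same way for character and factor columns.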
Finally, we need to ensure that the variables have the appropriate data type:
Converting numeric columns:
data$bathrooms <- as.numeric(data$bathrooms)
data$bedrooms <- as.numeric(data$bedrooms)
data$beds <- as.numeric(data$beds)
data$price <- as.numeric(data$price)
data$guests_included <- as.numeric(data$guests_included)
data$minimum_nights <- as.numeric(data$minimum_nights)
data$maximum_nights <- as.numeric(data$maximum_nights)
data$availability_over_one_year <- as.numeric(data$availability_over_one_year)
# Note: security_deposit is a factor of "$"-formatted strings, so this returns level codes, not amounts
data$security_deposit <- as.numeric(data$security_deposit)
data$listing_count <- as.numeric(data$listing_count)
data$host_score <- as.numeric(data$host_score)
data$reviews_per_month <- as.numeric(data$reviews_per_month)
data$number_of_reviews <- as.numeric(data$number_of_reviews)
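One caveat with as.numeric() concerns factor columns: it returns the internal level codes, not the values the labels represent. This appears to be what happens to security_deposit, whose first rows read 94, 1, 208, 106 after conversion instead of 200, NA, 501, 250. The sketch below reproduces the effect on the six deposit values printed earlier and shows one possible fix. It is an illustration, not the code used for the results in this report.

```r
# The six security_deposit values printed by head(data) above, as a factor.
deposit <- factor(c("$200.00", "", "$501.00", "$250.00", "", ""))

# Direct conversion returns the level codes, not the amounts.
codes <- as.numeric(deposit)
codes

# Convert via character and strip the currency formatting instead.
amounts <- as.numeric(gsub("[$,]", "", as.character(deposit)))
amounts
```

With this variant, listings without a deposit come out as NA rather than as the code of the empty level.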
Converting character columns:
data$neighbourhood <- as.character(data$neighbourhood)
Some neighbourhood names have encoding issues (accented characters appear garbled), so we rewrite them correctly:
data[data == "PanthÃ©on"] <- "Panthéon"
data[data == "OpÃ©ra"] <- "Opéra"
data[data == "EntrepÃ´t"] <- "Entrepôt"
data[data == "Ã‰lysÃ©e"] <- "Elysée"
data[data == "MÃ©nilmontant"] <- "Ménilmontant"
data[data == "HÃ´tel-de-Ville"] <- "Hôtel-de-Ville"
Notice that some columns have missing values. The approach followed here is to fill the missing values with the mean of the corresponding column (bathrooms, bedrooms and beds):
temp <- mean(data$bathrooms, na.rm = TRUE)
val <- is.na(data$bathrooms)
data$bathrooms[val] <- temp
temp <- mean(data$bedrooms, na.rm = TRUE)
val <- is.na(data$bedrooms)
data$bedrooms[val] <- temp
temp <- mean(data$beds, na.rm = TRUE)
val <- is.na(data$beds)
data$beds[val] <- temp
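Since the same three steps are repeated for each column, they can be folded into one function. `impute_mean()` is a hypothetical helper and the toy data frame is made up; the logic is the same mean imputation as above.

```r
# Hypothetical helper: replace NAs in the named numeric columns by the column mean.
impute_mean <- function(df, columns) {
  for (col in columns) {
    missing <- is.na(df[[col]])
    df[[col]][missing] <- mean(df[[col]], na.rm = TRUE)
  }
  df
}

toy <- data.frame(bathrooms = c(1, NA, 3), beds = c(2, 2, NA))
filled <- impute_mean(toy, c("bathrooms", "beds"))
filled
```

Mean imputation can produce non-integer counts (for example, a fractional number of bathrooms). Using the median instead is a common alternative when whole numbers matter.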
The data is now cleaned. Let's have a look at the first rows of our new dataset:
head(data)
## listing_id host_id host_name bathrooms bedrooms beds bed_type
## 1 4867396 9703910 Matthieu 1 1 1 Real Bed
## 2 7704653 35777602 Claire 2 2 3 Real Bed
## 3 2725029 13945253 Vincent 1 1 1 Real Bed
## 4 9337509 5107123 Julie 1 1 1 Real Bed
## 5 12928158 51195601 Daniele 1 1 1 Real Bed
## 6 5589471 28980052 Philippe 3 4 4 Real Bed
## equipments
## 1 {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer,Dryer,Essentials}
## 2 {"Wireless Internet",Kitchen,"Elevator in Building","Buzzer/Wireless Intercom",Washer,Dryer,Essentials}
## 3 {TV,Internet,"Wireless Internet",Kitchen,"Indoor Fireplace",Heating,"Family/Kid Friendly",Washer,Essentials,Shampoo}
## 4 {"Wireless Internet",Kitchen,Heating,Washer,Essentials}
## 5 {"Wireless Internet",Kitchen,"Smoking Allowed","Pets Allowed",Breakfast,"Elevator in Building",Heating,"Family/Kid Friendly",Washer,Dryer,Essentials,Shampoo}
## 6 {TV,Internet,"Wireless Internet",Kitchen,Heating,"Family/Kid Friendly",Washer,Dryer,"Smoke Detector","Fire Extinguisher",Essentials}
## type room nb_of_guests price guests_included minimum_nights
## 1 Apartment Entire home/apt 2 60 1 1
## 2 Apartment Entire home/apt 4 200 1 1
## 3 Apartment Entire home/apt 2 80 1 3
## 4 Apartment Entire home/apt 2 60 0 2
## 5 Apartment Private room 2 50 1 1
## 6 House Entire home/apt 6 191 1 3
## maximum_nights availability_over_one_year instant_bookable
## 1 1125 0 f
## 2 1125 0 f
## 3 1125 298 f
## 4 1125 364 f
## 5 30 89 f
## 6 1125 0 f
## cancellation_policy city
## 1 flexible Paris
## 2 flexible Paris
## 3 flexible Paris
## 4 flexible Paris
## 5 flexible Paris
## 6 flexible Paris
## address neighbourhood
## 1 Rue Legendre, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 2 Avenue Mac-Mahon, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 3 Rue la Condamine, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 4 Rue Gauthey, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## 5 Avenue Brunetière, Paris, Île-de-France 75017, France Batignolles-Monceau
## 6 Rue de Saussure, Paris, ÃŽle-de-France 75017, France Batignolles-Monceau
## city_quarter latitude longitude security_deposit transit host_response_time
## 1 75017 48.88880 2.320466 94 N/A
## 2 75017 48.87664 2.293724 1 N/A
## 3 75017 48.88384 2.321031 208 within an hour
## 4 75017 48.89236 2.322338 106 within a day
## 5 75017 48.88942 2.298321 1 within an hour
## 6 75017 48.88707 2.312212 1 N/A
## superhost host_since listing_count host_score reviews_per_month
## 1 f 2013-10-29 1 100 0.07
## 2 f 2015-06-14 1 NA NA
## 3 f 2014-04-06 1 80 0.11
## 4 f 2013-02-16 1 80 0.15
## 5 f 2015-12-13 1 100 2.00
## 6 f 2015-03-08 1 NA NA
## number_of_reviews
## 1 1
## 2 0
## 3 1
## 4 1
## 5 2
## 6 0
And also the summary:
summary(data)
## listing_id host_id host_name bathrooms
## Min. : 2623 Min. : 2626 Marie : 583 Min. :0.00
## 1st Qu.: 3470301 1st Qu.: 6158190 Nicolas : 436 1st Qu.:1.00
## Median : 6965852 Median :15885410 Pierre : 418 Median :1.00
## Mean : 7069608 Mean :22485601 Caroline: 388 Mean :1.09
## 3rd Qu.:10740059 3rd Qu.:34348717 Anne : 387 3rd Qu.:1.00
## Max. :13819560 Max. :81397049 Sophie : 372 Max. :8.00
## (Other) :50141
## bedrooms beds bed_type
## Min. : 0.000 Min. : 0.000 Airbed : 35
## 1st Qu.: 1.000 1st Qu.: 1.000 Couch : 1182
## Median : 1.000 Median : 1.000 Futon : 449
## Mean : 1.059 Mean : 1.684 Pull-out Sofa: 5066
## 3rd Qu.: 1.000 3rd Qu.: 2.000 Real Bed :45993
## Max. :10.000 Max. :16.000
##
## equipments
## {} : 552
## {TV,Internet,"Wireless Internet",Kitchen,Heating,Washer,Essentials} : 95
## {Internet,"Wireless Internet",Kitchen,Heating,Washer,Essentials} : 90
## {Internet,"Wireless Internet",Kitchen,Heating,Essentials} : 68
## {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer,Essentials}: 64
## {TV,"Cable TV",Internet,"Wireless Internet",Kitchen,Heating,Washer} : 64
## (Other) :51792
## type room nb_of_guests
## Apartment :50663 Entire home/apt:45177 Min. : 1.000
## Loft : 567 Private room : 7001 1st Qu.: 2.000
## House : 537 Shared room : 547 Median : 2.000
## Bed & Breakfast: 394 Mean : 3.051
## Condominium : 266 3rd Qu.: 4.000
## Other : 122 Max. :16.000
## (Other) : 176
## price guests_included minimum_nights maximum_nights
## Min. : 0.00 Min. : 0.000 Min. : 1.000 Min. :1.000e+00
## 1st Qu.: 55.00 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.:6.000e+01
## Median : 75.00 Median : 1.000 Median : 2.000 Median :1.125e+03
## Mean : 96.51 Mean : 1.353 Mean : 3.128 Mean :1.253e+05
## 3rd Qu.: 110.00 3rd Qu.: 2.000 3rd Qu.: 3.000 3rd Qu.:1.125e+03
## Max. :6081.00 Max. :16.000 Max. :1000.000 Max. :2.147e+09
##
## availability_over_one_year instant_bookable cancellation_policy
## Min. : 0.0 f:44186 flexible :19244
## 1st Qu.: 22.0 t: 8539 moderate :15039
## Median :183.0 strict :18427
## Mean :179.5 super_strict_30: 6
## 3rd Qu.:336.0 super_strict_60: 9
## Max. :365.0
##
## city
## Paris :50825
## Paris-15E-Arrondissement: 115
## Paris-19E-Arrondissement: 106
## Paris-20E-Arrondissement: 87
## Paris-18E-Arrondissement: 77
## Paris-16E-Arrondissement: 76
## (Other) : 1439
## address
## Paris, ÃŽle-de-France, France : 308
## Boulevard Voltaire, Paris, ÃŽle-de-France 75011, France : 209
## Rue du Faubourg Saint-Martin, Paris, ÃŽle-de-France 75010, France: 202
## Rue Oberkampf, Paris, ÃŽle-de-France 75011, France : 202
## Rue Saint-Maur, Paris, ÃŽle-de-France 75011, France : 196
## Rue de Charenton, Paris, ÃŽle-de-France 75012, France : 188
## (Other) :51420
## neighbourhood city_quarter latitude longitude
## Length:52725 75018 : 5973 Min. :48.81 Min. :2.221
## Class :character 75011 : 4825 1st Qu.:48.85 1st Qu.:2.323
## Mode :character 75015 : 3799 Median :48.86 Median :2.347
## 75010 : 3511 Mean :48.86 Mean :2.344
## 75017 : 3465 3rd Qu.:48.88 3rd Qu.:2.369
## 75020 : 2859 Max. :48.91 Max. :2.475
## (Other):28293
## security_deposit
## Min. : 1.00
## 1st Qu.: 1.00
## Median : 58.00
## Mean : 81.57
## 3rd Qu.:129.00
## Max. :304.00
##
## transit
## :18546
## Public transportation is a bit of a maze in Paris. I recommend you to book a transfer on the app Bonjour Paris (G00gle or Apple store). : 16
## DIRECT ACCESS From Airport CDG (Charles de Gaule-Roissy) DIRECT ACCESS From Airport ORLY EASY & FAST ACCESS from TRAIN STATIONS METRO Station Saint Michel line 4 is 3 minutes by foot from my place RER Station Saint Michel line B is 3 minutes by foot from my place TAXI STATION is 3 minutes by foot from my place By CAR : 2 choices of PARKING both 5 minutes by foot from my place : â\200œParking Saint Michelâ\200\235 Rue Francisque Gay n°46 and â\200œParking Notre Dameâ\200\235 Place Jean Paul II: 12
## Subway: Châtelet (lines 1, 4, 7, 11 & 14, RER A, B & D) : 12
## Odéon station line 4 and 10 Saint Michel station line 4, RER B and RER C : 10
## (Other) :34128
## NA's : 1
## host_response_time superhost host_since listing_count
## : 46 : 46 2012-05-04: 166 Min. : 1.000
## a few days or more: 996 f:50513 2012-06-18: 165 1st Qu.: 1.000
## N/A :12517 t: 2166 2012-10-25: 155 Median : 1.000
## within a day :10201 2014-03-10: 135 Mean : 4.087
## within a few hours:13926 2015-07-29: 128 3rd Qu.: 1.000
## within an hour :15039 2013-07-20: 116 Max. :155.000
## (Other) :51860
## host_score reviews_per_month number_of_reviews
## Min. : 20.00 Min. : 0.010 Min. : 0.00
## 1st Qu.: 87.00 1st Qu.: 0.360 1st Qu.: 0.00
## Median : 93.00 Median : 0.900 Median : 3.00
## Mean : 91.01 Mean : 1.336 Mean : 12.59
## 3rd Qu.: 97.00 3rd Qu.: 1.870 3rd Qu.: 13.00
## Max. :100.00 Max. :14.290 Max. :392.00
## NA's :15454 NA's :14508
summary(data$price)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 55.00 75.00 96.51 110.00 6081.00
p1 <- ggplot(data) +
  geom_histogram(aes(price), fill = "blue", alpha = 0.85, binwidth = 15) +
  theme_minimal(base_size = 13) +
  xlab("Price") +
  ylab("Frequency") +
  ggtitle("Distribution of Price")
p2 <- ggplot(data, aes(price)) +
  geom_histogram(bins = 30, aes(y = after_stat(density)), fill = "blue") +
  geom_density(alpha = 0.2, fill = "blue") +
  ggtitle("Logarithmic distribution of Price", subtitle = expression("With" ~ 'log'[10] ~ "transformation of x-axis")) +
  scale_x_log10()
ggarrange(p1,
          p2,
          nrow = 1,
          ncol = 2,
          labels = c("1. ", "2. "))
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 2 rows containing non-finite values (`stat_density()`).
The logarithmic scale gives a clearer view of the price distribution, which is strongly right-skewed on the linear scale.
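The warnings about infinite values are presumably caused by listings with a price of 0, since the logarithm of 0 is minus infinity. The sketch below illustrates this with the minimum, quartiles and maximum reported by summary(data$price). Filtering out non-positive prices before plotting is one possible way to avoid the warnings, not necessarily what the application does.

```r
# Min, quartiles and max from summary(data$price) above.
prices <- c(0, 55, 75, 110, 6081)

# The first element is -Inf, which a log-scaled axis cannot display.
log_prices <- log10(prices)
log_prices

# Keeping only strictly positive prices avoids the infinite values.
positive_prices <- prices[prices > 0]
range(log10(positive_prices))
```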
data %>% distinct(type)
## type
## 1 Apartment
## 2 House
## 3 Condominium
## 4 Loft
## 5 Other
## 6 Bed & Breakfast
## 7
## 8 Dorm
## 9 Townhouse
## 10 Boat
## 11 Villa
## 12 Tent
## 13 Cabin
## 14 Tipi
## 15 Camper/RV
## 16 Cave
## 17 Chalet
## 18 Treehouse
## 19 Earth House
## 20 Igloo
Listing types according to the property types:
property_type_count <- table(data$type)
property_types_counts <- table(data$type,exclude=names(property_type_count[property_type_count[] < 4000]))
others <- sum(as.vector(property_type_count[property_type_count[] < 4000]))
property_types_counts['Others'] <- others
property_types <- names(property_types_counts)
counts <- as.vector(property_types_counts)
percentages <- scales::percent(round(counts/sum(counts), 2))
property_types_percentages <- sprintf("%s (%s)", property_types, percentages)
property_types_counts_df <- data.frame(group = property_types, value = counts)
res1 <- ggplot(property_types_counts_df, aes(x="",y=value, fill=property_types_percentages)) +
geom_bar(width = 1,stat = "identity") +
coord_polar("y",start = 0) +
scale_fill_brewer("Property Types",palette = "BuPu")+
ggtitle("Listings according to property types") +
theme(plot.title = element_text(color = "Black", size = 12, hjust = 0.5))+
ylab("") +
xlab("") +
theme(axis.ticks = element_blank(), panel.grid = element_blank(), axis.text = element_blank()) +
geom_text(aes(label = percentages), size= 4, position = position_stack(vjust = 0.5))
res1
96% of the listings are of type apartment.
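The pie chart groups every property type with fewer than 4000 listings into a single "Others" slice. A minimal base-R sketch of that lumping step on made-up data (the `types` vector and the threshold of 3 are assumptions for illustration only):

```r
# Toy property types (hypothetical counts, not the real dataset)
types <- c(rep("Apartment", 8), rep("Loft", 3), "Boat", "Igloo")
min_count <- 3  # assumed threshold; the analysis above uses 4000

type_counts <- table(types)
rare <- names(type_counts[type_counts < min_count])

# Replace every rare type with a single "Others" label
lumped <- ifelse(types %in% rare, "Others", types)
lumped_counts <- table(lumped)
print(lumped_counts)
```

Relabelling the values before counting keeps the total number of listings unchanged, which makes the percentages easy to check.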
Distribution of the price for each property type:
ggplot(data) +
geom_boxplot(aes(x = type,y = price,fill = type)) +
labs(x = "Property Type",y = "Price",fill = "Property Type") +
coord_flip()
Some property types are more expensive than average: Villa, Townhouse, House and Camper/RV. Since 96% of the listings in the dataset are apartments, these types account for less than 4% of the listings.
data %>%distinct(room)
## room
## 1 Entire home/apt
## 2 Private room
## 3 Shared room
Listing types according to the room type:
room_types_counts <- table(data$room)
room_types <- names(room_types_counts)
counts <- as.vector(room_types_counts)
percentages <- scales::percent(round(counts/sum(counts), 2))
room_types_percentages <- sprintf("%s (%s)", room_types, percentages)
room_types_counts_df <- data.frame(group = room_types, value = counts)
res2 <- ggplot(room_types_counts_df, aes(x = "", y = value, fill = room_types_percentages)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start = 0) +
scale_fill_brewer("Room Types", palette = "BuPu") +
ggtitle("Listing types according to Room types") +
theme(plot.title = element_text(color = "black", size = 12, hjust = 0.5)) +
ylab("") +
xlab("") +
labs(fill="") +
theme(axis.ticks = element_blank(), panel.grid = element_blank(), axis.text = element_blank()) +
geom_text(aes(label = percentages), size = 5, position = position_stack(vjust = 0.5))
res2
There are three room types: Entire home/apt, Private room and Shared room. Of these, 86% of the listings are entire homes/apartments.
Price by room type:
ggplot(data)+
geom_boxplot(aes(x = room,y = price, fill = room)) +
labs(x = "Room Type", y = "Price", fill = "Room Type")+
coord_flip()
Price increases in this order: shared room < private room < entire home/apt. Let’s have a look at the average price by room type:
data %>%
group_by(room) %>%
summarise(mean_price = mean(price, na.rm = TRUE)) %>%
ggplot(aes(x = reorder(room, mean_price), y = mean_price)) +
geom_col(fill = "blue") + # geom_col() already uses stat = "identity"
coord_flip() +
theme_minimal() +
geom_text(aes(label = round(mean_price, digits = 2)), hjust = 1.0, color = "white", size = 4.5) +
ggtitle("Mean Price / Room Types") +
xlab("Room Type") +
ylab("Mean Price")
## `summarise()` ungrouping output (override with `.groups` argument)
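Since the price summary shows a maximum of 6081 against a median of 75, the mean per room type may be pulled up by a few extreme listings. A minimal sketch on made-up data (the `listings` data frame is hypothetical) comparing mean and median per room type in base R:

```r
# Toy listings (hypothetical values, not the real dataset)
listings <- data.frame(
  room  = c("Entire home/apt", "Entire home/apt", "Entire home/apt",
            "Private room", "Private room", "Shared room"),
  price = c(80, 100, 1500, 40, 60, 25)
)

# Mean and median price per room type
mean_price   <- tapply(listings$price, listings$room, mean)
median_price <- tapply(listings$price, listings$room, median)

print(round(mean_price, 2))
print(median_price)
```

In this toy example one expensive listing moves the mean of the first group far above its median, which is why the median can be a useful complement to the bar chart of means.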
price_cancellation_policy <- ggplot(data = data,
aes(x = cancellation_policy, y = price, color=cancellation_policy)) +
geom_boxplot(outlier.shape = NA) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(plot.title = element_text(color = "#971a4a", size = 12, face = "bold", hjust = 0.5))+
coord_cartesian(ylim = c(0, 500))
host_data_without_null_host_response_time <- subset(data, host_response_time != "N/A" & host_response_time != "")
price_response_time <- ggplot(data = host_data_without_null_host_response_time,
aes(x = host_response_time, y = price, color = host_response_time)) +
geom_boxplot(outlier.shape = NA) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(plot.title = element_text(color = "#971a4a", size = 12, face = "bold", hjust = 0.5)) +
coord_cartesian(ylim = c(0, 500))
ggarrange(price_response_time,
price_cancellation_policy,
nrow = 1,
ncol = 2,
labels = c("1. ", "2. "))
The first graph shows no relation between host response time and price. The second graph, however, shows that the cancellation policy does have an impact: listings are more or less expensive depending on the policy type.
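Boxplots only give a visual impression. One possible way to back it up, which is not part of the original analysis, is a Kruskal-Wallis rank test, available in base R as `kruskal.test()`. A minimal sketch on made-up prices for three hypothetical policy groups:

```r
# Toy data: three hypothetical cancellation policies with made-up prices
toy <- data.frame(
  policy = rep(c("flexible", "moderate", "strict"), each = 5),
  price  = c(50, 55, 60, 65, 70,
             80, 85, 90, 95, 100,
             150, 160, 170, 180, 190)
)

# Rank-based test of whether price differs between the policy groups
res <- kruskal.test(price ~ policy, data = toy)
print(res$p.value)
```

A small p-value would suggest that at least one group differs in price; on the real data the same call could be tried with `price ~ cancellation_policy`.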
ggplot(data = data, aes(x = instant_bookable, y = price, color = instant_bookable)) +
geom_boxplot(outlier.shape = NA) +coord_cartesian(ylim = c(0, 500))
Price shows no clear dependence on whether a listing is instantly bookable.
ggplot(data, aes(availability_over_one_year, price)) +
geom_point(alpha = 0.2, color = "#971a4a") +
geom_density(stat = "identity", alpha = 0.2) +
xlab("Availability over a year") +
ylab("Price") +
ggtitle("Relationship between availability and price")
Price shows no clear dependence on availability over the year.
count_by_host_1 <- data %>%
group_by(host_id) %>%
summarise(number_apt_by_host = n()) %>%
ungroup() %>%
mutate(groups = case_when(
number_apt_by_host == 1 ~ "001",
between(number_apt_by_host, 2, 50) ~ "002-050",
number_apt_by_host > 50 ~ "051-155")) # the largest host has 155 listings
## `summarise()` ungrouping output (override with `.groups` argument)
count_by_host_2 <- count_by_host_1 %>%
group_by(groups) %>%
summarise(counting = n()) %>%
arrange(desc(counting)) # order groups by number of hosts, descending
## `summarise()` ungrouping output (override with `.groups` argument)
num_apt_by_host_id <- (ggplot(count_by_host_2, aes(x = "", y = counting)) +
geom_col(aes(fill = factor(groups)), color = "white") +
geom_text(aes(y = counting / 1.23, label = counting),color = "black",size = 4)+
labs(x = "", y = "", fill = "Number of apartments per owner") +
coord_polar(theta = "y"))+
theme_minimal()
superhost <- (ggplot(data) +
geom_bar(aes(x='' , fill=superhost)) +
coord_polar(theta='y') +
scale_fill_brewer(palette="BuPu")) +
theme_minimal()
ggarrange(num_apt_by_host_id,
superhost,
nrow=2,
ncol=1,
align = "hv")
Most of the hosts have only one listing (41548 hosts). There is also a minority of superhosts.
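The host groups above are built with `case_when()`. As an alternative sketch, base R's `cut()` can produce the same kind of bins; the `n_listings` vector and the "051+" label below are assumptions for illustration:

```r
# Toy number of listings per host (hypothetical values)
n_listings <- c(1, 1, 1, 2, 7, 50, 51, 155)

# Right-closed intervals: (0,1], (1,50], (50,Inf]
host_group <- cut(n_listings,
                  breaks = c(0, 1, 50, Inf),
                  labels = c("001", "002-050", "051+"))

group_counts <- table(host_group)
print(group_counts)
```

An open-ended top label avoids having to hard-code the maximum number of listings in the group name.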
Top 20 hosts in Paris:
count_by_host_3 <- data %>%
group_by(host_id) %>%
summarise(number_apt_by_host = n()) %>%
arrange(desc(number_apt_by_host))
## `summarise()` ungrouping output (override with `.groups` argument)
top_listings_by_host <- count_by_host_3 %>%
top_n(n=20, wt = number_apt_by_host)
top_listings_by_host
## # A tibble: 22 x 2
## host_id number_apt_by_host
## <int> <int>
## 1 2288803 155
## 2 2667370 139
## 3 12984381 91
## 4 3972699 80
## 5 3943828 65
## 6 21630783 65
## 7 39922748 64
## 8 789620 60
## 9 11593703 56
## 10 3971743 55
## # ... with 12 more rows
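The table has 22 rows although 20 were requested because `top_n()` keeps all rows tied with the last selected value. A base-R sketch of the same rule on made-up counts (the `counts` vector is hypothetical):

```r
# Toy listing counts per host (hypothetical values)
counts <- c(a = 9, b = 7, c = 5, d = 5, e = 5, f = 2)
n <- 3

# Value at the n-th position when sorted in decreasing order
cutoff <- sort(counts, decreasing = TRUE)[[n]]

# Keeping everything >= cutoff retains all hosts tied with the n-th one
top_with_ties <- sort(counts[counts >= cutoff], decreasing = TRUE)
print(top_with_ties)
```

Here three hosts are requested but five are returned, because three of them share the cutoff value.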
listings_quarter <- ggplot(data, aes(x = fct_infreq(neighbourhood), fill = room)) +
geom_bar() +
labs(title = "Nb. Listings per city quarter",
x = "Neighbourhood", y = "Nb. of listings") +
theme(legend.position = "bottom",axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(color = "black", size = 12, hjust = 0.5))
average_prices <- aggregate(cbind(data$price),
by = list(arrond = data$city_quarter),
FUN = function(x) mean(x))
price <- ggplot(data = average_prices, aes(x = arrond, y = V1)) +
geom_bar(stat = "identity", fill = "lightblue", width = 0.7) +
geom_text(aes(label = round(V1, 2)), size=4) +
coord_flip() +
labs(title = "Average daily price per city quarter",
x = "City quarters", y = "Average daily price") +
theme(legend.position = "bottom",axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(color = "black", size = 12, hjust = 0.5))
ggarrange(listings_quarter,
price,
nrow =1,
ncol = 2,
labels = c("1. ", "2. "))
Top 10 neighborhoods:
data %>%
group_by(neighbourhood) %>%
dplyr::summarize(num_listings = n(), borough = unique(neighbourhood)) %>%
top_n(n = 10, wt = num_listings) %>%
ggplot(aes(x = fct_reorder(neighbourhood, num_listings), y = num_listings, fill = borough)) +
geom_col() +
coord_flip() +
labs(title = "Top 10 neighborhoods by nb. of listings", x = "Neighbourhood", y = "Nb. of listings")
## `summarise()` ungrouping output (override with `.groups` argument)
Rented apartments in the past years:
table <- inner_join(data, R, by = "listing_id")
table = mutate(table, year = as.numeric(str_extract(table$date, "^\\d{4}")))
table["date"] <- table["date"] %>% map(., as.Date)
longitudinal <- table %>%
group_by(date, neighbourhood) %>%
summarise(count_obs = n())
## `summarise()` regrouping output by 'date' (override with `.groups` argument)
time_location <- (ggplot(longitudinal, aes(x = date, y = count_obs, group = 1)) +
geom_line(size = 0.5, colour = "lightblue") +
stat_smooth(color = "#971a4a", method = "loess") +
scale_x_date(date_labels = "%Y") +
labs(x = "Year", y = "Nb. of rented apartments") +
facet_wrap(~ neighbourhood))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## Warning: Please use `linewidth` instead.
time_location
## `geom_smooth()` using formula = 'y ~ x'
The most visited and rented locations in Paris are the cheapest ones.
Map of listing prices across Paris neighborhoods (prices rise the closer we get to the center of Paris):
height <- max(data$latitude) - min(data$latitude)
width <- max(data$longitude) - min(data$longitude)
paris_limits <- c(bottom = min(data$latitude) - 0.1 * height,
top = max(data$latitude) + 0.1 * height,
left = min(data$longitude) - 0.1 * width,
right = max(data$longitude) + 0.1 * width)
map <- get_stamenmap(paris_limits, zoom = 12)
## i Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.
ggmap(map) +
geom_point(data = data, mapping = aes(x = longitude, y = latitude, col = log(price))) +
scale_color_distiller(palette = "BuPu", direction = 1)
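The map limits above extend the range of the coordinates by 10% on each side. A minimal sketch of that computation as a reusable function; `padded_bbox` is a hypothetical helper name, not part of ggmap, and the coordinates are made up:

```r
# Hypothetical helper: pad a lon/lat range by a fraction on every side
padded_bbox <- function(lon, lat, pad = 0.1) {
  width  <- diff(range(lon))
  height <- diff(range(lat))
  c(left   = min(lon) - pad * width,
    bottom = min(lat) - pad * height,
    right  = max(lon) + pad * width,
    top    = max(lat) + pad * height)
}

# Toy coordinates roughly around Paris (made-up points)
bbox <- padded_bbox(lon = c(2.25, 2.45), lat = c(48.80, 48.90))
print(bbox)
```

The named vector uses the same `left`, `bottom`, `right`, `top` names as `paris_limits` in the code above.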
table <- inner_join(data, R,by = "listing_id")
table = mutate(table, year = as.numeric(str_extract(table$date, "^\\d{4}")))
res3 <- ggplot(table) +
geom_bar(aes(y =city_quarter ,fill=factor(year))) +
scale_size_area() +
labs( x="Frequency", y="City quarter",fill="Year") +
scale_fill_brewer(palette ="BuPu")
ggplotly(res3)
For a clearer view of the data, the listings are also displayed with Leaflet. This map is interactive: you can pan, zoom and click on the marker clusters to arrange the display as you wish:
df <- select(data, longitude, neighbourhood, latitude, price)
leaflet(df) %>%
setView(lng = 2.3488, lat = 48.8534, zoom = 12) %>%
addTiles() %>%
addMarkers(clusterOptions = markerClusterOptions()) %>%
addMiniMap()
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
dfsuperhost <- filter(data, superhost == "t") # keep superhost listings only
leaflet(dfsuperhost %>% select(longitude, neighbourhood, latitude, price))%>%
setView(lng = 2.3488, lat = 48.8534 ,zoom = 12) %>%
addTiles() %>%
addMarkers(clusterOptions = markerClusterOptions()) %>%
addMiniMap()
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
After analysing the AirBnB dataset, one can conclude that the majority of the listings are entire homes/apartments, which is also the most expensive room type. Prices depend on features of the listing such as the cancellation policy and the neighborhood.
Most hosts have only one listing, but some have several; the host with the highest number of listings has 155.
The closer an apartment is to the center of Paris, the more expensive it is. The neighborhood with the highest number of listings is Buttes-Montmartre with 5952 listings, which is also the neighborhood with the highest number of rented apartments in the past years.
Visitors mostly rent entire homes/apartments, especially in the Buttes-Montmartre neighborhood, where there are more listings.
Finally, superhosts are a minority compared with regular hosts. This is probably because a superhost needs to be more active on the platform, host several guests a year and receive positive feedback from them in order to be awarded superhost status by AirBnB.